PLOS Genetics
Top medRxiv preprints most likely to be published in this journal, ranked by match strength.
Polygenic scores (PGS) offer moderate to high prediction accuracy for complex traits, but most are developed in European ancestry cohorts, reducing their performance in populations of other ancestries. This study aimed to improve standing height prediction, a heritable and ancestry-influenced trait, in an admixed Latino cohort (HCHS/SOL) by modeling ancestry using principal components (PCs) alongside PGS. SNPs were selected from a large European ancestry GWAS using various p-value thresholds, an...
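The general idea in the abstract above (select SNPs by p-value threshold, build a PGS, then model the trait with the PGS and ancestry PCs together) can be sketched as follows. This is only an illustration under stated assumptions: the function names, the toy arrays, and the use of ordinary least squares are hypothetical and are not taken from the study.

```python
import numpy as np

def polygenic_score(dosages, effect_sizes, p_values, p_threshold):
    """Sum of allele dosages weighted by GWAS effect sizes,
    using only SNPs whose GWAS p-value is below the threshold."""
    dosages = np.asarray(dosages, dtype=float)
    effect_sizes = np.asarray(effect_sizes, dtype=float)
    keep = np.asarray(p_values, dtype=float) < p_threshold
    return dosages[:, keep] @ effect_sizes[keep]

def fit_trait_model(pgs, pcs, trait):
    """Least-squares fit of trait ~ intercept + PGS + ancestry PCs.
    Returns coefficients ordered as [intercept, PGS, PC1, PC2, ...]."""
    X = np.column_stack([np.ones(len(pgs)), pgs, pcs])
    coef, *_ = np.linalg.lstsq(X, trait, rcond=None)
    return coef

# Hypothetical toy data: 2 individuals, 3 SNPs (dosages are 0, 1, or 2).
toy_dosages = [[0, 1, 2], [2, 1, 0]]
toy_betas = [0.5, -0.2, 0.1]
toy_pvals = [1e-9, 0.5, 1e-6]
toy_scores = polygenic_score(toy_dosages, toy_betas, toy_pvals, p_threshold=1e-5)

# Hypothetical simulated cohort, only to show the shape of the joint model.
rng = np.random.default_rng(0)
n, m, n_pcs = 200, 50, 2
sim_dosages = rng.integers(0, 3, size=(n, m))
sim_betas = rng.normal(0.0, 0.1, size=m)
sim_pvals = rng.uniform(0.0, 1.0, size=m)
sim_pcs = rng.normal(size=(n, n_pcs))
sim_pgs = polygenic_score(sim_dosages, sim_betas, sim_pvals, p_threshold=0.5)
sim_height = 170 + 2.0 * sim_pgs + 3.0 * sim_pcs[:, 0] + rng.normal(size=n)
sim_coef = fit_trait_model(sim_pgs, sim_pcs, sim_height)
```

Varying `p_threshold` changes which SNPs enter the score, which is one way the thresholds mentioned in the abstract could be compared.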
Available large-scale GWAS summary datasets predominantly stem from European populations, while sample sizes for other ethnicities, including Central/South Asian, East Asian, African, and Hispanic populations, remain comparatively limited, which leads to low precision of causal effect estimation within these ethnicities using Mendelian Randomization (MR). In this paper, we propose a Trans-ethnic MR method called TEMR to improve statistical power and estimation precision of MR in the target population usin...
Background: The current form of genome-wide association studies (GWAS) is inadequate to accurately explain the genetics of complex traits because it lacks sufficient statistical power. It tests each variant individually, but current studies show that multiple variants with varying effect sizes act in concert to produce a complex disease. To address this issue, we have developed an algorithmic framework that can effectively solve the multi-locus problem in GWAS with a very high le...
Background: Adipose tissue androgen turnover, dictated at least in part by the enzymes AKR1C2 and AKR1C3, has been linked to abdominal obesity. Recently, we investigated a single-nucleotide polymorphism (SNP) named rs28571858, that might increase AKR1C2 and AKR1C3 expression in human adipose tissue. Here, we studied the impact of rs28571848 on adipose tissue function and cardiometabolic health in bariatric surgery candidates. Methods: We genotyped a sample of 2776 bariatric surgery candidates and r...
Phenome-wide association studies (PheWAS) have become widely used for efficient, high-throughput evaluation of the relationship between a genetic factor and a large number of disease phenotypes, typically extracted from a DNA biobank linked with electronic medical records (EMR). Phecodes, billing code-derived disease case-control status, are usually used as outcome variables in PheWAS, and logistic regression has been the standard choice of analysis method. Since the clinical diagnoses in EMR are...
We propose an efficient method to generate the summary statistics for set-based gene-environment interaction tests, as well as a meta-analysis approach that aggregates the summary statistics across different studies, which can be applied to large biobank-scale sequencing studies with related samples. Simulations showed that meta-analysis is numerically concordant with the equivalent pooled analysis using individual-level data. Moreover, meta-analysis accommodates heterogeneity between studies an...
Understanding the complex causal relationships among major clinical outcomes and the causal interplay among multiple organs remains a significant challenge. By using imaging phenotypes, we can characterize the functional and structural architecture of major human organs. Mendelian randomization (MR) provides a valuable framework for inferring causality by leveraging genetic variants as instrumental variables. In this study, we conducted a systematic multi-organ MR analysis involving 402 imaging ...
Complex diseases share heritable components which can be leveraged to identify drug targets with low side effect or high repurposing potential, but current methods cannot efficiently make these inferences at scale using public data. We introduce a Bayesian model to estimate the polygenic structure of a trait using GWAS summary data (BPACT). Across 32 complex traits, we estimated that 69.5 to 97.5% of disease-associated druggable genes are shared between multiple traits. We observed that targetin...
Accurate disease risk stratification can lead to more precise and personalized prevention and treatment of diseases. As an important component of disease risk, genetic risk factors can be utilized as an early and stable predictor for disease onset. Recently, the polygenic risk score (PRS) method has combined the effects from hundreds to millions of single nucleotide polymorphisms (SNPs) into a score that can be used for genetic risk stratification. However, current PRS approaches only utilize ...
Several gene-based tests, e.g., sequence kernel association test, have been developed for association testing of rare single nucleotide variants (SNVs) in genomic regions with disease traits. A common limitation of these aggregate methods is their inability to discriminate potentially causal variants from null variants within the tested regions. We propose a novel clustering method to classify rare variants into null and signal variant groups using summary statistics from the gene-based tests ba...
We conducted a comprehensive genetic investigation of obesity in a cohort of 93,673 Korean individuals, categorized by both body mass index and waist circumference using Korean-specific and international criteria. To explore the genetic architecture of obesity and its comorbidities, we performed genome-wide association studies and constructed polygenic risk scores (PRSs) using both conventional single trait and advanced multiple-trait models, including the PRSsum approach. Our analyses identifi...
Polygenic risk scores (PRSs) are promising tools for advancing precision medicine. However, existing PRS construction methods rely on static summary statistics derived from genome-wide association studies (GWASs), which are often updated at lengthy intervals. As genetic data and health outcomes are continuously being generated at an ever-increasing pace, the current PRS training and deployment paradigm is suboptimal in maximizing the prediction accuracy of PRSs for incoming patients in healthcar...
Optical Coherence Tomography (OCT) enables non-invasive imaging of the retina and is often used to diagnose and manage multiple ophthalmic diseases including glaucoma. We present the first large-scale quantitative genome-wide association study of inner retinal morphology using phenotypes derived from OCT images of 31,434 UK Biobank participants. We identify 46 loci associated with thickness of the retinal nerve fibre layer or ganglion cell inner plexiform layer. Only one of these loci has previo...
Long-read sequencing (LRS) enables variant calling of high-quality structural variants (SVs). Genotypers of SVs utilize these precise call sets to increase the recall and precision of genotyping in short-read sequencing (SRS) samples. With the extensive growth in availability of SRS datasets in recent years, we should be able to calculate accurate population allele frequencies of SVs. However, reprocessing hundreds of terabytes of raw SRS data to genotype new variants is impractical for populatio...
Withdrawal statement: This manuscript has been withdrawn by medRxiv following a formal request by the QIMR Berghofer Medical Research Institute Research Integrity Office owing to lack of author consent.
Linear mixed models (LMMs) are widely used in gene-environment interaction (GEI) studies to account for population structure and relatedness. However, genome-wide GEI tests using LMMs are computationally intensive, and model-based tests can yield inflated type I error rates when environmental main effects are misspecified. While robust inference methods exist for unrelated samples, challenges remain for related individuals. A common workaround is a two-step approach that first adjusts for relate...
Groups of complex diseases, such as coronary heart diseases, neuropsychiatric disorders, and cancers, often display overlapping clinical symptoms and pharmacological treatments. The shared associations of genetic variants across diseases have the potential to explain their underlying biological processes, but this remains poorly understood. To address this, we model the matrix of summary statistics of trait-associated genetic variants as the sum of a low-rank component - representing shared biolo...
Linear mixed models (LMMs) have been widely used in genome-wide association studies (GWAS) to control for population stratification and cryptic relatedness. Unfortunately, estimating LMM parameters is computationally expensive, necessitating large-scale matrix operations to build the genetic relatedness matrix (GRM). Over the past 25 years, Randomized Linear Algebra has provided alternative approaches to such matrix operations by leveraging matrix sketching, which often results in provably accur...
Mendelian Randomisation Egger regression (MR-Egger) is a popular method for causal inference using single-nucleotide polymorphisms (SNPs) as instrumental variables. It allows all SNPs to have direct pleiotropic effects on the outcome, provided that those effects are independent of the effects on the exposure, known as the InSIDE assumption. However, the results of MR-Egger, and the InSIDE assumption itself, are sensitive to which allele is coded as the effect allele for each SNP. A pragmatic con...
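The allele-coding sensitivity described above can be illustrated with a small sketch. MR-Egger is implemented here as an inverse-variance weighted regression of SNP-outcome effects on SNP-exposure effects with an intercept. The summary statistics are made up, and the re-orientation step (flip each SNP so that its exposure effect is positive) is shown as one commonly used convention; it is an assumption here, since the abstract is cut off before it names the convention it discusses.

```python
import numpy as np

def mr_egger(beta_exposure, beta_outcome, se_outcome):
    """Weighted least squares of outcome effects on exposure effects.
    Returns (intercept, slope); the slope is the causal estimate and
    the intercept captures average directional pleiotropy."""
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    w = 1.0 / np.asarray(se_outcome, dtype=float) ** 2
    X = np.column_stack([np.ones_like(bx), bx])
    XtW = X.T * w  # scales each SNP's column by its weight
    intercept, slope = np.linalg.solve(XtW @ X, XtW @ by)
    return intercept, slope

def orient_to_positive_exposure(beta_exposure, beta_outcome):
    """Flip the effect allele of every SNP with a negative exposure effect."""
    bx = np.asarray(beta_exposure, dtype=float)
    by = np.asarray(beta_outcome, dtype=float)
    sign = np.where(bx < 0, -1.0, 1.0)
    return bx * sign, by * sign

# Hypothetical summary statistics: outcome effect = 0.1 + 0.5 * exposure effect.
bx = np.array([0.1, 0.2, 0.3, 0.4])
by = 0.1 + 0.5 * bx
se = np.full(4, 0.05)

# Same SNPs, but with the effect allele swapped for the first and third SNP.
flip = np.array([-1.0, 1.0, -1.0, 1.0])
bx_flipped, by_flipped = bx * flip, by * flip

original_fit = mr_egger(bx, by, se)
flipped_fit = mr_egger(bx_flipped, by_flipped, se)
reoriented_fit = mr_egger(*orient_to_positive_exposure(bx_flipped, by_flipped), se)
```

In this toy example the intercept and slope change when two effect alleles are swapped, and applying the orientation rule restores the original estimates. Because of the intercept term, the regression is not invariant to per-SNP sign flips.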
Obesity is a major risk factor for COVID-19 severity; however, the mechanisms underlying this relationship are not fully understood. Since obesity influences the plasma proteome, we sought to identify circulating proteins mediating the effects of obesity on COVID-19 severity in humans. Here, we screened 4,907 plasma proteins to identify proteins influenced by body mass index (BMI) using Mendelian randomization (MR). This yielded 1,216 proteins, whose effect on COVID-19 severity was assessed, aga...